ABSTRACT

The accountability movement in American education has
received a great deal of attention in the past few years. This
movement, in turn, suggests the need for particular data to inform
responsible decision-making; the need for assessment instruments
which can address the question of what is learned; and the need for
strengthening the existing ecosystem between schools, universities,
the public, and the government agencies concerned with education.
Specifically, problems within the context of reading instruction and
assessment are discussed. Assessing students in relation to a
criterion of mastery is strongly advocated, and existing weaknesses
are pointed out in specifying objectives, selecting and designing
evaluation instruments, interpreting evaluation data, and improving
instructional methodology. (Author/DEP)
S. Jay Samuels and Glenace E. Edwall

ABOUT THIS REPORT 

The accountability movement in American
education has received a great deal of attention
in the past few years. Dr. S. Jay Samuels, Professor
of Educational Psychology, and Glenace E. Edwall
at the University of Minnesota discuss the problems
within the context of reading instruction and
assessment. They strongly advocate assessing students
in relation to a criterion of mastery and point
out existing weaknesses in specifying objectives,
selecting and designing evaluation instruments,
interpreting evaluation data, and improving
instructional methodology. Positive suggestions are
offered for strengthening the existing ecosystem
between our schools, universities, the public and
the government agencies concerned with education.

Dr. Samuels is Director of the Minnesota Reading
Research Projects and has done extensive research
in reading acquisition.

Glenace Edwall is a Fellow in the Center for Research
in Human Learning, is involved in research
in the reading project, and consults on program
evaluation locally.
S. Jay Samuels and Glenace E. Edwall

University of Minnesota 

Evaluation in American education has entered a new
era in recent years with the clamor of public interest
groups for information about their schools, and the
recognition by taxpayers and educators alike that the
school is in some sense accountable for the products of
the system. The Minneapolis Citizens League, typifying
the community movements for educational accountability,
notes:

We have come to understand that there are two
primary clients to be served by our public
schools: society, and the individual student (and/or
the family as spokesman or guardian of the
student). . . . Both society at large and the individual
student's family have a very legitimate
claim on public education. . . . The prevalent
form or level of accountability is no longer
accepted as being sufficient. Increasingly, the constraints
of limited resources coupled with the
desire for excellence has generated a demand for a
results-oriented system of accountability. Put
simply, people want to know what outcomes are
achieved by the expenditure of educational resources.
At the same time, many parents and students
are dissatisfied with the degree to which
their school system is responsive to them.

As this call implies, the type of "accountability"
that has traditionally been operative, i.e., answering
specific charges of parents when difficulties arise and
informing the public only of "input information" such
as teacher preparation and budget allocations for
equipment and buildings, is not sufficient for dealing
with the central question of the accountability movement,
which is, "What is learned?" Approaching an answer to
that question, we would suggest, demands that
the schools have the tools for making responsible decisions
in response to the needs of society and individual
students, and these tools include the specification of
objectives, means of assessment commensurate with
those objectives, and the development of available
channels for effecting changes in and upgrading the
quality of instruction. As we shall argue, these tools
are at present largely not employed to the best advantage:
at all levels, objectives are less than clear; the
tests most frequently used are not directed to answering
the what is learned question, and thus cannot provide
information for decision-making; and the ties
between the school districts and the resources of university
colleges of education are tenuous. Changes in
relation to these needs must be made in order that the
schools can be accountable to the public which is their
support and to the students who are their responsibility.

*Support for this paper was provided by grants to the Center for
Research in Human Learning (NSF, NICHD, The University Graduate
School), the Minnesota Reading Research Project (NICHD) and the
Research & Development Center for the Educationally Handicapped
(BEH).






The Problem of Assessment 
It is the problem of assessment which is central to
the demand for accountability, for it is only to the
extent that educators can measure learning (and lack
of such) by their students that they can report to the
public. Although no country in the world currently
employs standardized achievement testing to the
degree of the U.S., paradoxically we still do not have
answers to basic questions of what students have
learned. Our testing is largely misused and non-functional.

The misuse of standardized tests has become commonplace,
fanning the anti-test movement which has
resulted in court decisions barring the administration
of familiar achievement/intelligence tests in some
areas. Farr and Roser outline the most frequent misuses
of such standardized tests as including test administration
without a clearly stated purpose; use of tests
related to specific goals for the assessment of global
objectives; use of test results as the sole criterion for
judging an educational program; improper release and
interpretation of results; and use of tests as classifying
tools for a rigidly tracked, labeled system of educational
stratification. Farr and Roser conclude:

Taken together, these five misuses of tests and
test results are justifiably significant evidence for
universal opposition to the continual 'misuse of
tests' in our nation's schools.
Despite the justifiable outrage against misuse of
tests, however, means of assessment continue to be
demanded. Thus the more serious charge against the
tests is that even with appropriate safeguards in administration
and interpretation, the tests most widely used
may have no relation to the objectives of instruction
and are therefore non-functional (and, thus, in the
extreme case, may actually be severely damaging to an
individual student's educational progress and to society's
right to know the outcomes of education).






The essence of the problem of assessment instruments
such as standardized reading achievement tests
lies in norm-referencing. The most commonly used
reading achievement tests, sample items of which we
shall examine in some detail, are norm-referenced tests;
the data that can be provided from such tests are of a
comparative nature only and cannot answer the what is
learned question or provide direction for decision-makers.

The philosophy and beginning use of norm-referenced
tests came from an era when classification
of individuals was the raison d'être of testing; early
intelligence tests, as is well known, were designed for
the purpose of separating those who could benefit
from the contemporary educational system from those
who could not, and testing found an early function
and expansion in making similar classification decisions
for industry and the military. It is immediately obvious
that these tests, and the form of the scores they
reported, were designed for determining who was
better than whom at whatever the test ostensibly
measured.

When such tests are taken into the classroom, the
charges which can be leveled against the practice are
legion. Because of the emphasis on placement of one
person in relation to others, it is the total score (or
total scores for subtests) which is of primary importance,
and the tests thus provide no specific knowledge
of students' competencies, no diagnostic information
in terms of specific difficulties, and no information
which can guide decision-making relative to the
school's responsibility to the individual student.

More specifically, the emphasis on total score frequently
results in the mixing of items which are generally
related to the labeled score but may be based on
rather different abilities, e.g., literal and inferential
comprehension items in reading achievement tests. Not
wishing to single out any one publisher's test for the
kind of generic problem commonly found in numerous
commercial reading tests, we offer the following hypothetical
example:

Lost in the Woods 

John and Bill carefully slid their boat onto the
muddy land. They jumped ashore followed by
their dog. For hours the boys and their dog
wandered through the woods looking for the
beaver pond. As the sun started to set John and
Bill became aware they were lost. Bill called the
dog and told him to go back to the boat. The dog
sniffed at the trail as he ran through the woods
and in a short time led them back to the boat.

Comprehension Questions: 

1. In the woods, the dog followed the trail by
   a. sight
   b. smell
   c. touch
   d. sounds

2. The boys were in the woods looking for a
   a. lost child
   b. lost dog
   c. beaver pond
   d. lake

3. John and Bill were
   a. glad their dog was along
   b. brothers
   c. much too young to be hiking in the woods
   d. foolish to cross the lake in a boat

There are several problems with these comprehension
questions. Question one can be answered without
having read the paragraph, by means of general knowledge
of canine behavior. Question two measures literal
comprehension of information contained in the passage.
Question three measures inferential comprehension,
since information about the correct answer is
not contained directly in the paragraph and the student
taking the test must make an inference about the
response alternative which is most probably correct.

If a student fails to answer question three correctly,
one cannot easily diagnose the nature of the problem.
The student may be able to interpret literally what he
reads but be poor in inferential reasoning. On the other
hand, the student's problem may be failure in literal
comprehension, which would, of course, prevent him
from reasoning from the information provided in the
paragraph.

As is often the case in norm-referenced tests, a single
comprehension score is assigned to the student. This
score provides virtually no diagnostic information
about the student's reading strengths and weaknesses.
The score is primarily useful in comparing the student
to others of similar age, but this score is of limited
usefulness. At the very least, norm-referenced tests
should provide two comprehension scores for each
student, one for literal and the other for inferential
comprehension.
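To make the recommendation concrete, here is a minimal Python sketch that reports a score per comprehension skill alongside the single total. The mapping of questions to skills follows the analysis of the "Lost in the Woods" items above, but the names `ITEM_SKILLS`, `subscores`, and `total_score`, and the two students' answers, are hypothetical illustrations, not part of any published test.

```python
from collections import defaultdict

# Hypothetical mapping of the three "Lost in the Woods" questions to the
# skill each one is assumed to tap (question 1 is answerable from general
# knowledge of dogs, question 2 is literal, question 3 is inferential).
ITEM_SKILLS = {1: "general knowledge", 2: "literal", 3: "inferential"}


def total_score(responses):
    """Single score of the kind a norm-referenced test reports."""
    return sum(1 for correct in responses.values() if correct)


def subscores(responses, item_skills=ITEM_SKILLS):
    """Return {skill: (number correct, number of items)} for one student.

    responses maps question number -> True if answered correctly.
    """
    counts = defaultdict(lambda: [0, 0])
    for item, correct in responses.items():
        skill = item_skills[item]
        counts[skill][1] += 1          # items attempted for this skill
        if correct:
            counts[skill][0] += 1      # items answered correctly
    return {skill: (right, n) for skill, (right, n) in counts.items()}


# Two hypothetical students with identical totals but different profiles.
student_a = {1: True, 2: True, 3: False}   # misses the inferential item
student_b = {1: False, 2: True, 3: True}   # misses the general-knowledge item
print(total_score(student_a), total_score(student_b))
print(subscores(student_a))
print(subscores(student_b))
```

Both students receive the same total of 2, yet the subscores show that student A missed the inferential item while student B missed only the item answerable from general knowledge. A real test would need several items per skill for such subscores to be dependable.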

Because the items which are retained through the
development of a norm-referenced test are those
which are highest in predictive validity, they may
actually be low in content validity. "Good" items are
those which 50 per cent of the test takers can pass, but
items which are most useful in such a discrimination
are not the most highly related to instructional
goals (presumably high-priority material which was
taught well should be passed by 90 per cent of
students).

As Ralph Tyler sums up this problem:

These tests thus provide dependable information
about where the child stands in his total test
performance in relation to the norm group. But
when one seeks to find out whether a student
who made a low score has learned certain things
during the year, the test does not include enough
questions covering the material on which he was
working to furnish a dependable answer to that
question.
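The 50 versus 90 per cent contrast can be illustrated with the standard result that a single right/wrong item passed by a proportion p of examinees has variance p(1 - p), which is largest at p = 0.5. The Python sketch below assumes this simple item-variance view of discrimination; the function names `item_difficulty` and `item_variance` and the item data are hypothetical.

```python
def item_difficulty(item_responses):
    """Proportion of examinees who answered the item correctly (its p-value).

    item_responses is a list of 1 (correct) and 0 (incorrect).
    """
    return sum(item_responses) / len(item_responses)


def item_variance(p):
    """Variance of a right/wrong item passed by a proportion p of examinees.

    p * (1 - p) peaks at p = 0.5, which is why items near a 50 per cent
    pass rate spread students apart the most.
    """
    return p * (1 - p)


# Hypothetical well-taught item: 9 of 10 students pass it.
well_taught = [1] * 9 + [0]
print(item_difficulty(well_taught))

for p in (0.5, 0.7, 0.9):
    print(f"p = {p:.1f}  variance = {item_variance(p):.2f}")
```

An item passed by 90 per cent of students, as a well-taught objective should be, contributes a variance of only 0.09 against 0.25 at p = 0.5, so a developer selecting items for spread will tend to discard exactly the items that show instruction succeeded.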

One may conclude, then, that the norm-referenced 
tests widely employed to gauge educational progress in 
reading and other basic skills areas do not approach the 
"what is learned" criterion for assessment, and thus are 
not of value to the educator concerned about data for
responsible decision-making. 

One Solution:
The Move to Criterion-Referenced Testing

Viewed in this way, the most obvious solution to
the assessment problem is to design instruments sensitive
to what students have learned rather than to their
relative achievement. This approach is termed criterion- or
domain-referenced testing, derived from the emphasis
which is placed on the individual student's standing
in relation to a criterion of mastery of a given skill or
subject matter. Put most simply, the criterion for test
item selection in this model is mastery, rather than
discrimination; "the test developer is not interested in
the spread of performance but rather in how many
students are able to perform well enough to pass the
anchor point." It is our contention that such design
can better serve educational decision-making needs.
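A small Python sketch can contrast the two ways of reading the same set of scores: counting students who pass an anchor point versus ranking each student against the group. The 80 per cent anchor point, the function names `mastery_report` and `percentile_rank`, and the class scores are all hypothetical; a real program would set the criterion from its stated objectives.

```python
def mastery_report(scores, max_score, criterion=0.8):
    """Criterion-referenced view: how many students reach the anchor point.

    criterion is a hypothetical mastery level (80 per cent correct by default).
    Returns (number reaching mastery, proportion reaching mastery).
    """
    masters = [s for s in scores if s / max_score >= criterion]
    return len(masters), len(masters) / len(scores)


def percentile_rank(score, scores):
    """Norm-referenced view: percentage of the group scoring below `score`."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)


# Hypothetical class of ten students on a 20-item unit test.
class_scores = [18, 17, 19, 16, 20, 15, 17, 18, 12, 19]
print(mastery_report(class_scores, max_score=20))
print(percentile_rank(16, class_scores))
```

Here 8 of the 10 students reach the anchor point. The student who scored 16 has mastered the unit under this criterion, yet a percentile rank of 20 would place the same student near the bottom of the group.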






The conceptual framework for criterion-referenced
testing is based on the mastery model of instruction
promulgated by Carroll, Bloom and others. The
assumptions inherent in this model are simply that
most children can learn the content of instruction, and
that the societal needs shaping educational and testing
decisions are different from those of the era of norm-referencing:
in the basic skills areas with which we are
primarily concerned, a more complex technological
society cannot afford the luxury of educational rejects,
and must ensure that its children all have a mastery of
necessary skills. Thus we assume that children can
learn, and our task in testing is measuring each child's
progress toward a skill/knowledge criterion.

The support for these assumptions and the move to
criterion-referenced testing comes from the models of
educators such as Carroll and Bloom, and from research
work with populations presenting special instructional
problems. Conceptually, the model proposed by Carroll

. . . makes it clear that if the students are normally
distributed with respect to aptitude for
some subject and all the students are provided
with exactly the same instruction (same in terms
of amount of instruction, quality of instruction,
and time available for learning), the end result
will be a normal distribution on an appropriate
measure of achievement. . . . Conversely, if the
students are normally distributed with respect to
aptitude, but the kind and quality of instruction
and the amount of time available for learning are
made appropriate to the characteristics and needs
of each student, the majority of students may be
expected to achieve mastery of the subject.
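Carroll's model is commonly summarized as making the degree of learning a function of the time a student actually spends relative to the time he needs. The Python simulation below is a loose sketch of the contrast in the quotation under that summary: the linear link between aptitude and hours needed, the 36-hour fixed allotment, and the 0.9 mastery level are hypothetical choices, not values from Carroll or Bloom.

```python
import random


def degree_of_learning(time_spent, time_needed):
    """Ratio of time spent to time needed, capped at 1.0 (full mastery)."""
    return min(1.0, time_spent / time_needed)


def mastery_rates(n=1000, seed=1, fixed_hours=36.0, criterion=0.9):
    """Share of simulated students reaching `criterion` under two conditions.

    All numbers are hypothetical. Aptitude is drawn from Normal(100, 15),
    and lower aptitude is assumed to mean more hours needed to learn the unit.
    """
    rng = random.Random(seed)
    aptitudes = [rng.gauss(100, 15) for _ in range(n)]
    hours_needed = [max(1.0, 40.0 - 0.2 * (a - 100)) for a in aptitudes]

    # Condition 1: the same amount of time for every student.
    equal = [degree_of_learning(fixed_hours, t) for t in hours_needed]
    # Condition 2: time made appropriate to each student's needs.
    adapted = [degree_of_learning(t, t) for t in hours_needed]

    def share_mastering(levels):
        return sum(level >= criterion for level in levels) / n

    return share_mastering(equal), share_mastering(adapted)


equal_rate, adapted_rate = mastery_rates()
print(f"equal time: {equal_rate:.0%} reach mastery")
print(f"adapted time: {adapted_rate:.0%} reach mastery")
```

With these assumptions roughly half of the simulated students reach the mastery level when everyone receives the same 36 hours, while every student reaches it when time is matched to need. The point is the direction of the difference, not the particular percentages, which depend entirely on the invented parameters.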






That this model for mastery is in fact workable can
be seen in research and reading programs designed for
populations with special instructional problems; in a
variety of situations, pupils who would be consigned
on the basis of norms to "slow learner" or "low
achieving" groups are being taught to read. Samuels
and Dahl, reviewing a reading program in the Kansas
City schools, concluded that ghetto children are learning
to read in schools with a philosophy of success
where reading and spelling instruction are emphasized
and subskills are learned to mastery; and a variety of
community reading programs for special target populations
have been meeting success when "each program is
designed around the needs of the students."

Recent research has found virtually no difference in
performance on very simple learning tasks among
individuals who seemingly differ considerably in IQ.
Samuels and Anderson found no differences between
two IQ groups of third graders in a simple associational
learning task, and Zeaman and House found an
attentional variable among three groups of retarded
learners in a discrimination learning task, but the three
acquisition curves showed no differences once the
critical features of the stimuli had been perceived.
Other studies of basic learning ability have similarly
found no difference in actual tests of learning performance
among individuals who differed considerably in
IQ.

If we are therefore convinced that most children can
learn the content of instruction, and that it is desirable
both for the student and for society that he do so, the
purpose of testing is not assessing students' relative
standing, but rather measuring their progress toward
mastery of the unit of instruction in terms of a given
criterion. Note that with this approach, if instructional
objectives and the commensurate form of assessment
are clearly defined and well designed, the data from
testing are specifically related to students' competencies
(i.e., they attempt to answer the what is learned
question) and can provide diagnostic information
revealing the student's particular strengths and weaknesses
in a skill or knowledge area. This is so because
items are specifically constructed with content validity
as the most important criterion, as opposed to discriminatory
or predictive ability.

Concern with finding out what students know in a
more absolute sense has motivated a variety of
criterion-referenced approaches to testing and an
emphasis on the content of items and their relation to
instructional objectives. The National Assessment of
Educational Progress is the most far-reaching of such
projects: a plan for a "systematic, census-like survey
of knowledge, skills, understandings and attitudes
designed to sample four age levels in ten different
subject areas." The items developed by NAEP
exhibit the concern with measuring what students
know in relation to the objectives of instruction:

The most important criterion which was established
for exercise development was that every
exercise must be a direct measure of some knowledge,
skill or attitude which was stated in the
objectives. That is, it must have content validity.
. . . An exercise must be meaningful, make sense
and be directly related to the objective. It must
not be trivial, inconsequential or peripheral to the
objective.

State education agencies are initiating similar assessment
programs for measuring pupil achievement
through the development of educational objectives and
construction of tests to measure how well these objectives
are being met in current educational programs,
and criterion-referenced tests and guidelines for corresponding
objectives are also beginning to be available
commercially for classroom use. Thus, although the
criterion-referenced approach is not without problems
of its own, particularly the time required for development,
it is beginning to be usefully employed to provide
information on the outputs of our instructional systems.



In addition to the necessity of providing new forms
of assessment, the accountability movement has also 
drawn attention to the need for developing stronger 
ties between the various parties involved in education: 
the schools, the universities' colleges of education, 
concerned citizenry, and relevant governmental 
agencies. 

The present system regarding these entities can well
be termed a deteriorating ecological interaction. In the
schools, as noted above, testing is often misused and
non-functional; test results often are not reported
publicly; "research" consists of making compilations
from the school's archives rather than either correlational
study or experimentation; and, thus, with objectives
not stated explicitly and no public knowledge of
assessment results, the factors necessary for accountability
are simply not present.

Problems also exist in this ecosystem between the
schools and colleges of education. The colleges of
education have a resource of experts in subject matter
areas of relevance to the schools, but these experts are chiefly
concerned with teacher preparation, graduate training,
and research. Where service to school districts does
occur, the expert is in the role of an entrepreneur; the
system is also marked by few concerted efforts on the
part of the schools to bring the educational experts
directly to the task of improving the educational
attainment of their students. As regards the state
agencies concerned with education, few systematic
attempts have been made to collect data on educational
achievements.






Some changes fostered by the accountability movement,
however, are already being made which are
strengthening the interaction. We shall note these and
add some suggestions:

1. The state and the schools. Some state assessment
in reading is now being done, using in part materials
developed by the National Assessment of Educational
Progress. Such data could also be collected for
school systems by the state agency, through financial
"piggy-backing" on the state program by local school
districts, using funds currently spent for standardized
assessment instruments. In this way, data would
become available for each school and age group in the
school district, and could be used to inform both the
public and decision-makers. Such a "piggy-back" program
is now being used in the Richfield, Minnesota,
schools, drawing on state assessment materials in
reading.






2. Colleges of education and the schools. Although
the type of interaction which would be optimal
between these systems is prolonged, systematic, developmental
interest in the schools by educational experts,
it must be realized that the university staff is
simply not sufficiently large for such direct involvement.
Therefore, we would suggest that the staff could
be best utilized in setting up model programs in
troubled school districts. These programs, in turn, may
serve as demonstration centers, e.g., for reading instruction,
to a wider audience, and would allow for the
systematic collection of data on experimental methods
and programs. Further, these programs could then be
extended by especially training graduate students to
implement them in school districts needing help; this
practice would also provide future educators with
valuable applied experience.

3. The schools and accountability. In order to have
the data to become accountable, the schools, as we
have suggested above, need help in specifying
objectives, selecting and designing evaluation instruments,
interpreting the data of evaluation, and designing
and improving instructional methodology. We have
suggested fundamental changes in calling for decision-making
based on data collection for diagnostic information;
for changes in evaluation and testing, including
the selection of texts and tests on the basis of the
criterion/mastery model and participation in state
assessment programs with "piggy-backing" to obtain
specific data on district schools; and for aid in instructional
design via closer ties with university colleges of
education. To these suggestions we would add two
further points:






Central to the problem of accountability (once we
can assume that at least some direction has been given
to the measurement question) is the need for the
school district, with the input of the citizens concerned,
to specify the objectives for which accountability
is held. A model for developing objectives can
be found in California schools under the provisions of
the Stull Bill, where a plan prepared jointly by teacher
and principal evaluates teachers on the basis of the
actual performance of pupils in achieving the formulated
objectives. The methodology of the National
Assessment of Educational Progress in developing
objectives may also provide a helpful guide for the
schools:






1. The objectives must be satisfactory goals for
each subject area as seen by subject matter
specialists.

2. The objectives must be ones which currently
are accepted as goals of American education
by most schools.

3. The objectives must be ones which are acceptable
to thoughtful lay adults as reasonable
goals of American education.

This combination of input from educators, subject 
matter specialists, and concerned citizens could be 
extremely helpful for school districts, both in clarifying
and explicitly stating objectives, and in increasing
interaction with educational professionals and lay 
people. 






Secondly, in the move for accountability, the
schools must work toward the formulation of contingencies
of reward on the basis of the meeting of
responsibilities. It would seem that this is the essence
of accountability, but in fact there are no incentives
(other than personal) for teachers and administrators
in the current system. Reinforcements (salary, other
privileges) are delivered to educators at the present
time in a way which experimentation has shown
results in low productivity. We would suggest that a
better system, to be used at least in part, is one modeled
on a schedule in which reinforcement is based on productivity,
resembling merit pay raises given to government
workers in some states, and the university system,
where salary, tenure and rate of promotion are contingent
on productivity. Making rewards contingent on
performance has been shown to produce higher rates
of response (performance) and, in some cases, greater
resistance to extinction (perseverance in the face of
difficulty, in educational parlance).






Conclusion

It has been our purpose to take serious note of the
move for accountability in American educational evaluation.
This movement, in turn, suggests the need for
particular data to inform responsible decision-making;
the need for assessment instruments which can address
the what is learned question; and the need for
strengthening the existing ecosystem between our
schools, universities, the public and the government
agencies concerned with education.